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5-1— i Statistical arbitrage strategies, such as pairs trading and its generalizations, rely on 

the construction of mean-reverting spreads enjoying a certain degree of predictability. 
Gaussian linear state-space processes have recently been proposed as a model for such 
spreads under the assumption that the observed process is a noisy realization of some 



hidden states. Real-time estimation of the unobserved spread process can reveal tempo- 
rary market inefficiencies which can then be exploited to generate excess returns. Building 



on previous work, we embrace the state-space framework for modeling spread processes 

o 

(yQ ' and extend this methodology along three different directions. First, we introduce time- 

dependency in the model parameters, which allows for quick adaptation to changes in the 
data generating process. Second, we provide an on-line estimation algorithm that can be 

X ■ 

■ constantly run in real-time. Being computationally fast, the algorithm is particularly suit- 

ed ' 

able for building aggressive trading strategics based on high-frequency data and may be 
used as a monitoring device for mean-reversion. Finally, our framework naturally provides 
informative uncertainty measures of all the estimated parameters. Experimental results 
based on Monte Carlo simulations and historical equity data arc discussed, including a 
co-integration relationship involving two exchange-traded funds. 
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1 Introduction 



A time series is known to exhibit mean reversion when, over a certain period of time, is 
"reverting" to a constant mean. In recent years, the notion of mean reversion has received 
a considerable amount of attention in the financial literature. For instance, there has been 
increasing interest in studying the long-run properties of stock prices, with particular atten- 
tion being paid to investigate whether stock prices can be characterized as random walks or 
mean reverting processes. If a price time series evolves as a random walk, then any shock is 
permanent and there is no tendency for the price level to return to a constant mean over time; 
moreover, in the long run, the volatility of the process is expected to grow without bound, 
and the time series cannot be predicted based on historical observations. On the other hand, 
if a time series of stock prices follows a mean reverting process, investors m ay be able to fore 



cast f utur e returns by using past inform ation. Since the seminal work of 



Fama and French 



19881 ] and 



Poterba and Summers 



19881 ]. who first documented mean-reversion in stock mar- 



ket returns during a long time hori zon, several studies have b een carried out to detect mean 
reversion in several marke t s (e.g. 



Deaton and Laroque 



19921 ] . 



(e.g. 


Chaudhuri and Wu 


2003 


]) and many asset classes (e. 


Jorion and Sweenev 


1996] 


)• 



Since future observations of a mean-reverting time series can potentially be forecasted 
using historical data, a number of studies have also exa mined the impli catio ns of mean re 



versio n on portfolio allocation and asset management; see 



Barberis 



20001 ] and 



Carcano et al 



20051 ] for recent works. Active asset allocation strategies based on mean-reverting portfo- 
lios, which generally fall under the umbrella of statistical arbitrage, have been utilized by 
investment banks and hedge funds, with varying degree of success, for several years. Possibly 
the simplest of such strategies consists of a portfolio of only two assets, as in pairs trading. 
This trading approach consists in going long a certain asset while shorting another asset in 
such a way that the resulting portfolio has no net exposure to broad market moves. In this 
sense, the strategy is often described as market neutral. Entire monographs have been writ- 
ten to illustrate how pairs trading works, how it can be impleme nted in real se t tings , and 



how i ts pe rformance has evolved in recent years (see, for instance, 



Pole 



Vidvamurthv 



20041 ] and 



20071 ]). The underlying assumption of pairs trading is that two financial instruments 



with similar characteristics must be priced more or less the same. Accordingly, the first step 
consists in finding two financial instruments whose prices, in the long term, are expected to 
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be tied together by some common stochastic trend. What this implies is that, although the 
two time series of prices may not necessarily move in the same direction at all times, their 
spread (for instance, the simple price difference) will fluctuate around an equilibrium level. 
Since the spread quantifies the degree of mispricing of one asset relative to the other one, 
these strategies are also refereed to as relative-value. If a common stochastic trend indeed 
exists between the two chosen assets, any temporary deviation from the assumed mean or 
equilibrium level is likely to correct itself over time. The predictability of this portfolio can 
then be exploited to generate excess returns: a trader, or an algorithmic trading system, 
would open a position every time a substantially large deviation from the equilibrium level 
is detected and would close the position when the spread has reverted back to the its mean. 
This simple concept can be extended in several ways, for instance by replacing one of the 
two assets with an artificial one (e.g. a linear combination of asset prices), with the pur- 
pose of exploiting the same notions of relative-value pricing and mean-reversion, although 
in different ways; s ome r elev ant work along these lines ha s been documented, among others, 



by 



Montana et al. 



20091 ] and 



Montana and Parrella 



20091 ] , who describe statistical arbitrage 



strategies involving futures contracts and exchange-traded funds (ETFs), respectively. One 
aspect that has not been fully investigated in the studies above is how to explicitly model the 
resulting observed spread. A stochastic model describing how the spread evolves over time is 
highly desirable because it allows the analyst to precisely characterize and monitor some of 
its salient properties, such as mean-reversion. Moreover, improved trading rules may be built 
around specif ic properties of the adopted spread process. 



Recently, 



Elliott et al. 



20051 ] suggested that Gaussian linear state-space processes may 



be suitable for modeling mean-reverting spreads arising in pairs trading, and described how 
such models can yield statistical arbitrage strategies. Their main observation is that the 
observed process should be seen as a noisy realization of an underlying hidden process de- 
scribing the true spread, which may capture the true market conditions; thus, a comparison 
of the estimated unobserved spread process with the observed one may lead to the discov- 
ery of temporary market inefficiencies. Based on t he additional assum ption that the model 
parameters do not vary over short periods of time, 



Elliott et al 



20051 ] suggested to use the 



EM algorithm, an iterative procedure for maximum likelihood estimation, for tracking the 
hidden process and estimating the other unknown model parameters. To make the exposition 
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self-contained, we briefly review their model in Section [21 and state under what conditions 
the stochastic process is mean-reverting. 



In this paper we build upon the model by 



Elliott et al 



20051 ] and extend their method- 



ology in a number of ways. First, in Section O we introduce time-dependency in the model 
parameters. The main advantage of this formulation is a gain in flexibility, as the model is 
able to adapt quickly to changes in the data generating process; Section T3. II further motivates 
our formulation and discusses its potential advantages. In Section [3.21 we derive new condi- 
tions that need to be satisfied for a model with time- varying parameters to be mean-reverting. 
In Section 13.31 we describe a Bayesian framework for parameter estimation which then leads 
to a recursive parameter estimation procedure suitable for real-time applications. The final 
algorithm is detailed in Section HJ an analysis and discussion on the convergence properties 
of the algorithm as well as practical suggestions on how to specify the initial values and prior 
distributions are provided. Unlike the EM algorithm, our estimation procedure also produces 
uncertainty measures without any additional computational costs. With a view on statistical 
arbitrage, in Section [4.51 we add a note discussing how pairs trading may be implemented us- 
ing the spread models proposed in this work and enumerate other important issues involved 
in realistic implementations, together with some pointers to the relevant literature. However, 
an empirical evalutation of trading strategies is beyond the scope of this work. For further 
discussions on statistical arbitrage approaches based o n mean-rev erting spreads and many 
illustrative numerical examples the reader is referred to 



Pole 



2007 ] 



In Section^ based on a battery of Monte Carlo simulations, we demonstrate that posterior 
means estimated on-line by our Bayesian algorithm recovers the true model parameters and 
can be particularly advantageous when the analysts wishes to track sudden changes in the 
mean-level of the spread and its mean-reverting behavior. For instance, real-time monitoring 
may be used to derive stop-loss rules in algorithmic trading. Two examples involving real 
historical data are given in Section 15.21 where the cointegrating relationship between a pair 
of stocks and a pairs of ETFs are discussed. Final remarks are found in Section [6] and the 
proofs of arguments in Sections 13.21 and 14.31 can be found in the appendix. 
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2 Time- invariant state-space models for mean- reverting spreads 

Throughout the paper we will assume that the trader has identified two candidate financial 

instruments whose prices are observed at discrete time points t = 1,2,... and are denoted by 

(i) 

p\ , with j = 1, 2. At any given time t, let yt denote the price spread, defined as 

, (*) a U) 

for some parameters a and f3 which are usually estimated by ordinary least squares (OLS) 
methods using historical data. It seems common practice to select the order of i and j such 
that yt yields the largest j3 and the resulting spread captures as much information as possible 
about the (linear) co-movement of the two assets. In Section 15.21 we briefly mention how 
a penalized OLS model may be used for recursive estimation of a time-varying (3. More 
generally, the observed spread yt may also be obtained in different ways or may represent 
the return process of an initial price spread. One of the two component processes {Pt} may 
even be artificially built using a linear combination of a basket of assets. For our purposes, 
the only requirement is th at the process jyt} i s assumed to be mean-reverting. 



Furthermore, following 



Elliott et al 



20051 ] . we assume that the observed spread yt is a 



noisy realization of a true but unobserved spread or state xt- The state process {xt} is defined 
such that 

x t - x t -i = a - bx t -i + e t (1) 

where < b < 2, a is an unrestricted real number and x\ is the initial state. The restriction 
< b < 2 is imposed, because otherwise {xt} is non-stationary and thus mean reversion 
has probability zero to occur. The innovation series {et} is taken to be an i.i.d. Gaussian 
process with zero mean and variance C 2 , and £ t+1 is assumed to be uncorrelated of xt, for 
t = 1,2,.... Conditions for the state process to be mean-reverting are established using 
standard arguments, as follows. First, rewrite (H|) as 

x t = a + (1 - b)x t -\ + e t 

Expanding on this, we obtain 

t-2 t-2 

x t = (l- b) t ~ 1 x 1 + a ^(1 - bf + ^(1 - byet-i 

i=0 i=0 
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Then, taking expectations and variances, 



E(x t ) = (l-b) t - 1 {E(x 1 )-^-} + 



and 



Var(x f ) = (1 - b) 2{t - l) \ Var(xi) 



1 



C 2 



l-(l-6) 2 j 1- (1-6) 2 
It is observed that, when |1 — b\ < 1, and regardless of a, limt_ +00 (l — = and 

therefore lim^oo E{xt) = a/b. Therefore, in the long run, the state process fluctuates around 
its mean level a/b. Otherwise, when |1 — b\ > 1, (1 — ft)^ 1 is unbounded and hence E(xt) 
is unbounded too. Analogously, when |1 — b\ < 1 and regardless of a, the variance Var(xt) 
converges to C 2 /{1 — (1 — b) 2 }. Conversely, if |1 — b\ > 1, the variance of xt is unbounded 
with geometric speed. It is concluded that the hidden process \xt/\ i s mea n reverting when 



1 — b lies inside the unit circle. Adopting the notation of lElliott et al 
and B = 1 — b, so that the process xt can be rewritten as 



20051 ] . we define A = a 



A + Bx t ~ 



(2) 



Without loss of generality, we postulate that {yt} is a noisy version of {xt} generated as 



y t = x t + uj t 



(3) 



where {ujt} is Gaussian white noise with variance D 2 and ujt is uncorrelated of xt, for t = 
1, 2, . . .. From ([3]), it also follows that {yt} is a mean-reverting process. 

Note that, together with an initial distribution of the state xi, equations ([2]) and ([3]) define 
a Gaussian linear state-space model with parame t ers A. B, C, D. State-space models were 



originally developed by control engineers 



Kalman 



19601 ] and are useful tools for ex pressing 



dynam ic s ystems involving unobserv ed state variables. The reader is also referred to 



19891 ] and 



West and Harrison 



Harvey 



19971 ] for book-length expositions. 



3 Time-varying dynamic models and on-line estimation 
3.1 Preliminaries 

The linear Gaussian state-space model described by equations ([2]) and ([3]) contains the un- 
known parameters A,B,C and D which need to be estimated using historical data. When the 
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parameters are known, the Ka lman filter provides a recursive procedure for estimating the 



state process xt 



Kalman . 



I960]. Full d erivations of the Kalman filter and lucid explanations 



in a Bayesian framework can be found in 



Meinhold and Singpurwalla 



19831 ] . In practice, max- 



imum likelihood estimation (MLE) of the unknown parameters is required in order to fully 
specify the model. MLE for state-space models can be routinely carried out in a missing-data 



: rame work using t 



1982] and 



Ghosh 



le EM algorithm, as first proposed in the 1980s 



3; 



19891; a detailed derivation can also be found in 



by 



Shumwav and Stoffer 



Ghahramani and Hinton 



19961 ]. In the context of pairs trading, 



Elliott et al 



20051 ] reports some simulation and 



calibration studies demonstrating that the EM algorithm provides a consistent and robust 
estimation proce dure for the model ([2])- ([3]), a nd su ggest that the finite-dimensional recursive 



filter described in 



Elliott and Krishnamurthv 



19991 ] may also be used for estimation (although 



no results were provid ed) 



Elliott et al 



20051 ] suggest to base model estimation on data points belonging to a look- 



back window of size N. A full iterative calibration procedure is then run till convergence 
every time a new data point is observed and the window has been shifted one-step ahead. 
This approach implicitly requires the analyst to select a value of N (the effective sample size) 
ensuring that, within each time window, the model parameters do not vary. The selection of 
N may be difficult without a proper model selection procedure in place to test the assumption 
that the model is locally appropriate. For instance, although a small value of iV may guar- 
antee adequacy of the model, it could also lead to notable biases in the parameter estimates. 
When N is too large, a number of factors such as special market events, persisting pricing 
inefficiencies or structural price changes may invalidate the modeling assumptions. Clearly, 
the question of how much history to use to calibrate a model and the corresponding trading 
strategy is a critical one. 

From a practical point of view, repeating the EM algorithm several times over different 
window sizes in search for an optimal window size may be computationally expensive. Even 
performing a single calibration run may not be fast enough to accommodate very aggres- 
sive trading strategies in high-frequency domains, due to the well-known slow convergence 
properties of the EM algorithm. More notably, a vanilla application of this algorithm does 
not automatically provide any measure of parameter uncertainty. Although various methods 
and modifications have been proposed in the statistical literature in this direction (see, for 



7 



instance, iMcLachlan and Krishnanl [19971 ]). the resulting methods usually introduce further 
computational complexity. 

In order to cope with these limitations, in this section we present and discuss our three 
main contributions. Firstly, we introduce more flexibility and release some of the modeling 
assumptions by allowing the model parameters to vary over time; in this way, both smooth 
and sudden changes in the data generating process (such as those created by structural price 
changes and unusual persistence of market inefficiencies) will be more easily accommodated. 
Secondly, we propose a practical on-line estimation procedure that, being non-iterative, can 
be run efficiently over time, even at high sampling frequencies, and does not inflict the burden 
of frequent re-calibration and window size selection. Ideally, a model should be able to adapt 
to changes in the data generating mechanism with minimal user intervention, and should 
be amenable to on-line monitoring so that the key parameters characterizing the underlying 
mean-reverting property can always be under continuous scrutiny. These features enable the 
trader (or trading system) to take fast dynamic decisions. Thirdly, as a result of the Bayesian 
framework proposed here for recursive estimation, measures of uncertainty extracted from 
the full posterior distribution can be routinely computed at no extra cost. These measures 
can be very informative in quantifying and assessing estimation errors, and can potentially be 
exploited to derive more robust trading strategies; see Section [5] for some practical examples. 

3.2 The proposed model 

In this section we initially propose a variation of the classic state-space model used by 



Elliott et al. 



20051 ] in which the parameters are not assumed to be constant over time. This 
modification will then force us to reconsider under which conditions the spread process is 
mean-reverting. 

First, let us rearrange the model (J2]) and ([3]) in an autoregressive (AR) form. From (|3|), 
note that xt = Vt — &t- Then, from substitution in ([2]) for t > 2, we obtain 

y t = A + Byt-i + e t (4) 

where e t = uJt — Bu> t -i + £t is distributed as a iV(0, cr 2 ), for a 2 = D 2 + B 2 D 2 + C 2 . The above 
model is an AR model of order 1 with parameters A, B and a 2 . 

We achieve time-dependence in the parameters of @ by replacing A and B with At 
and Bt, respectively, and postulating that both parameters evolve over time, according to 



some weakly stationary process. Here we consider the case of At and Bt changing over time 
via AR models, but more general time series may be considered. These choices lead to the 
specification of a time- varying AR model of order 1, or TVAR(l). Accordingly, the observed 
spread is described by the following law, 

y t = A t + Btyt-i + e t (5) 
A t = (piAt-i + fit, B t = foBt-i + vzt 

where <pi and 02 are the AR coefficients, usually being assumed to lie inside the unit circle 
so that At and Bt may be weakly stationary processes. 

Setting 6t = (A t , B t )' and F t = (1, yt-i)' , the model can be expressed in state space form, 

y t = F[Q t + e t (6) 
9 t = *e t -i + vt (7) 

with <J> = diag(0i, 02) and error structure governed by the observation error et ~ N(0, a 2 ) and 
the evolution error vector v t = {yxt,^2t)' ~ -^2(0, cr 2 Vt), where A^(-,-) denotes the bivariate 
Gaussian distribution. It is assumed that the innovation series {et} and {z^} are individually 
and mutually uncorrelated and they are also uncorrelated of the initial state vector 9±, i.e. 
E{e t e s ) = 0;E(u t u' s ) = 0;E(e t u u ) = O^e^i) = 0;E(u t e[) = 0, for any t / s, where E{.) 
denotes expectation and 6[ denotes the row vector of 0\. 

With the inclusion of a time component in the parameters A and B, we now need to revise 
the conditions under which the mean reversion property holds true. The next result gives 
sufficient conditions for the spread {yt} to be mean-reverting. 

Theorem 1. If {yt} is generated from model then, conditionally on a realized se- 

quence B\, . . . , Bt, {yt} is mean reverting if one of the two conditions apply: 

(a) (pi = 4> 2 = 1, V t = and \Bi\ < 1; 

(b) 4>i and 4>2 ^ e inside the unit circle, Vt is bounded and \B t \ < 1, for all t > to and for 
some integer to > 0. 

Some comments are in order. First we note that if At = A and Bt = B (this is achieved 
by setting (pi = 02 = 1, and by forcing the covariance matrix of v% to be zero for all t), the 
condition \B\\ = \B\ < 1 of Theorem [T] reduces to the known condition of mean reversion 
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for the static AR model, as in the previous section. In the dynamic case, when At and/or 



Bt change over time, the condition 
fashion. Following the approach of 



Bt\ < 1 enables us to check mean reversion in an on-line 



Elliott et al 



20051 ] for the AR model, we use model 



in order to obtain estimates Bt of Bt and then we check \B\\ < 1 in order to declare whether 
{yt} is mean reverting or not; in the following sections we detail the computations involved 
in the estimation of Bt. Structural changes in the level of yt are accounted through estimates 
of At, but these do not affect the mean reversion of {yt} a s At controls only the level of yt. 



For the case of At = A, this is explained in some detail in 



Elliott et al 



20051 ] and 



inform atio n on structural ch anges for cointegrated systems the reader is referred to 



for more 



Johansen 



19881 ] and 



Liitkepohl 



20061 . Chapter 6] . The following result is a useful corollary of Theorem 



m 

Corollary 1. If {yt} is generated from model (EP-(?|) with <f>\ = 1, |^| < 1) Vit = Vi2,t 
then {yt} is mean reverting if \Bt\ < 1, for all t > to, for some to > 0, where Vt = (Vij t) 



0. 



The proof of this corollary follows by combining the proofs of (a) and (b) of Theorem Q] 
(see the appendix ). Corollary [1] gives an important case, in which At = A, for all t as in 



Elliott et al 



20051 ]. but Bt changes according to a weakly stationary AR model. This can be 



used when it is expected that At will be approximately constant and benefit may be gained 
by reducing the tuning of the four parameters (pi, 4>2, 5±, 62 to tuning of two parameters 4>2,o~2- 
For a further discussion on this topic see Sections 14.41 and 

In this paper we propose ([5]) as a flexible time-varying model for the observed spread. 
However, more general time-varying autoregressive models may be used. Consider that yt is 
generated from a time-varying AR model of order d, i.e. 

d 

y t = At + ^Buyt-i + et, t>d+l (8) 

i=l 

and the time-varying AR parameters A t and Bn follow first order AR models, as 



-4, 



*>iA t -i + v\u B it = (/>i+iBi t -i + Vi+i, 



where d is the order or lag of the autoregression, the innovations et and vn are individually and 
mutually uncorrelated and they are uncorrelated with the initial states A^ and B^. Certain 
Gaussian distributions may be assumed on et and vn and on the states A t and B^. It is readily 
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seen that this model can be casted in state space form ([6]) with F t = (1, yt-i, ■ ■ ■ , yt-d)' \ Ot = 
(At, B\t, ■ ■ ■ , Bdt)' and <J> = diag(</>i, . . . , 4>d+i) (the diagonal matrix with diagonal elements 
<f>i, . . . , 4>d+i)- It is clear that model © is a special case of model (|8|) with d = 1. When 
the general model is adopted, the conditions of mean reversion of {yt} of Theorem [1] need to 
be revised, as follows. For (pi (i = 1, . . . ,d + 1) being inside the unit circle, for t > to, all 
(time-dependent) solutions of the autoregressive polynomial ip(x) = 1 — Yli=i Bux 1 must lie 
outside th e unit circle. This effectively means that after some to, {yt} is locally stationary 



Dahlhaus 



1997]]. For the remainder of this paper, we consider the situation of d = 1, i.e. 



model ©, as this is a simple and parsimonious model. 



3.3 A Bayesian framework 

We adopt a Bayesian formulation that, within the realm of conjugate analysis, allows us 



tainty. The analysis we 



Prado and Huerta 



Triantafvllopoulos 



measures ol 


uncer 


' West et al. 




1999 



2007a]]. Initially, we assume that, given the 



20021 ] . and 

observational variance a 2 , the initial state 8± follows a bivariate Gaussian distribution with 
mean vector m\ and covariance matrix a 2 P\. Also, we place an inverted gamma density prior 
with parameters n\j2 and d\/2 on a 2 . In summary, the prior structure is specified as follows 



0) 



6i\a 2 ~ N 2 (m 1 ,a 2 Pi) and a 2 ~ 7G(m/2, di/2), 

where m\,P\,ni,di are assumed known; we comment on their specification in Section 
Note that, unconditionally of a 2 , the initial state 9\ follows a Student t distribution. 

With these priors in place, the posterior distribution of 6t\(J 2 and the predictive distribu- 
tion of yt\o 2 are routinely obtained by the Kalman filter. We elaborate more on this as follows. 
First, assume that at time t — 1 the posteriors are given by 6t~i\cr 2 , y <_1 ~ N2(mt-\, a 2 Pt-i) 
and (T 2 ]^*" 1 ~ IG(nt-i/2, dt-i/2), for some mt—i, Pt-i, n-t-i and dt—i- Here the notation 
y* means that all data points observed up to time t are included. Then, writing the likeli- 
hood function (or evidence) for an observation yt as p(yt\0t,(J 2 ), an application of the Bayes 
theorem gives 



piyt^a^pjeM 2 ,^- 1 ) 



n 



It follows that the posterior density of 0t\a 2 is Gaussian, and specifically 

0t|ffV ~N 2 (m t ,a 2 P t ). 

The recurrence equations for updating mt and Pt are provided in Section HJ The probability 
density p(yt\a 2 , y <_1 ) refers to the one-step ahead forecast density, which is obtained from 



the prior p(O t \a , y l ~ ) as yt\o\y ~ N(f t ,o Qt)- Again, see Section [4] below for recursive 
equations needed to update ft and Qt- 

The posterior distribution of a 2 is also obtained by an application of the Bayes theorem, 

p(o*\yt) = p(ytk 2 .y t " 1 )p(g 2 |y t ~ 1 ) i 

which gives an inverted gamma density cr 2 ~ IG(n t /2,d t /2), depending on parameters 
nt and dt- Here yt\y t ~ 1 follows a t distribution with rit-i degrees of freedom yt|y* _1 ~ 
t(n t -i, f t ,QtS t -i), with St-i = dt-x/nt-i. 

From the density p(9t\cr 2 , y*) , the posterior distribution of 9t, unconditionally of a 2 , is 
easily obtained by integrating a 2 out; it It then follows that 9t\y l ~ t2(n t ,mt, PtSt). From 
this the (1 — 7)% marginal confidence interval of B t is 



m 2t ± t y / 2 ^P22,tSt 

where m f = (mit,m2t)', Pt = (Pij,t)ij=i,2 and t 7 denotes the 1007% quantile of the standard 
t distribution with nt degrees of freedom. The (1 — 7)% confidence interval for xt+i is 



ft+i ± * 7 /2 y FI +l R t +iFt+iS t 
and the (1 — 7)% confidence interval for yt+i is 

ft+l ± t>y/2\/ Qt+lSt 

where the recurrence relationships of -Rt+i and Qt+i are given below. 

Some references on related time series models are in order. Fr o m a frequentist perspective 



time varying AR models 



nave 



Francq and Gautier 



aeen discussed in 



20041 ] and 



Dahlhaus 



Anderson and Meerschaert 



1997] 



Francq and Zakoan 



20011 ] . 



20051 ] . Among other works, recur- 



si ye estimation of t ime v arying autoregressive processes in a n onpar ametric settin g is discussed 



m 



Moulines et al. 



20051 ] and, for non Gaussian processes, in 



Diuric et al. 



20021 ] , using parti- 



cle filters. Standard Bayesian AR models have been developed since the early 70's, see e.g. 



12 



Zellner 



1972 



Monahan 



19831 ] 



Kadivala and Karlsson 



software for model estimation is widely available 



19971 ] and 



Ni and Sun 



20031 ]. Free 



4 On-line estimation 

4.1 An adaptive and recursive algorithm using discount factors 

In this section we provide the updating equations needed to compute the posterior densities 
of #t|y* and of cx 2 ]?/ at each time step. Starting at time t = 1 with a quadruple of initial 
values (mi, Pi,n\, d\), the calibration algorithm then proceeds as follows: 

R t = $P t -i$ + V t , Q t = F[R t F t + 1, e t = y t - F^mt^ 
K t = R t F t /Q t , m t = $m t - 1 + K t e u P t = R t - K t K[Q t (10) 
r t = yt- F{m t , n t = n t -\ + 1, d t = d t _i + r t e t , S t = — 

For any t = 2, . . . , T, the above algorithm estimates the target posterior quantities of interest; 
for instance, we can extract posterior and predictive mean and variances, as well as relevant 
quantiles and credible bounds of Qt and a 2 . From Qt = {At, Bt)' and the posterior distribution 
of Qt\y l , we can extract the posterior distribution of Bt\y l ■ The condition for mean-reversion 
established in Theorem [1] can be monitored recursively by extracting the posterior mean of 
Bt\y l ' , say Bt, and assessing whether \Bt\ is strictly less than one. Credible bounds can also 
be associated to the posterior mean in order to better assess the possibility that the process 
is still mean-reverting - see the examples in Section [SJ 

The full specification of algorithm (|10p requires the selection of a covariance matrix Vt, 
which is responsible for the st ochastic evolution of the signal Qt and hence the stochastic 



change of At and Bt- Following 



West and Harrison 



19971 . Chapter 6] we advocate a practical 



and convenient analytical solution which allows us to learn this variance component directly 
from the data in a sequential way by means of two discount factors, Si and 62', this is referred 
to as component discounting. The idea is that by assuming Pi and Vt to be diagonal matrices 
we can use the two discount factors to discount the precision of the updating of the mean and 



1 Time- varying AR models are implemented in the computing language R (website: 
http://cran.r-project.org/) via the contributed package timsac. S-plus, Fortran and Matlab 
routines for the implementation of these models can be downloaded from the website of Mike West 
(http : //www. stat . duke . edu/research/ sof tware/west/tvar .html ) 
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v t 



the variance of 9t as we move from time t — 1 to t. In other words we use 8\ and 82 to specify 
the covariance matrix Vt as 

^(l-tfiWfpn.t-i 

£^(1 - 5 2 )(f>2P22,t-l 

where Pt = {pij)ij=i,2- This implies that Rt = diag(^pn ] t_i/(5i, 4>\p22,t-i/ 82) and thus, as 
we move from t — 1 to t, the prior variance of At is increased by a factor of l/<5x and of Bt by 
a factor of 1/(52. Of course if 8\ = 82 = 1, then Vt = and in this case 0t carries no stochastic 
evolution. If we allow 8\ = 1 and £2 < 1 5 then only Bt has stochastic evolution over time. 

4.2 Model comparison and model assessment 

The performance of the estimation procedure of Sections 13.31 and 141 can be form ally evaluated 



using model diagnostic and model compar ison tools; see, for instanc e, 



Li 



exposition of time series diagnostics and 



Harrison and West 



20041 ] for a general 



199ll ] for diagnostics in state 



space models. In this section we briefly discuss three diagnostic tools, namely the mean 
of the squared standardized forecast errors (MSSE), the likelihood function, and sequential 
Bayes factors. 

From the Student t distribution of y t \y l ~ l , i.e. yt|y t_1 ~ t(n t _i, ft, QtSt-i), we can define 
the standardized one-step forecast errors (or standardized residuals) as Ut = Qt ^t-i(Vt~ft), 
so that ut\y t ~ 1 ~ t(nt-i, 0, 1) (the standard t distribution with nt— 1 degrees of freedom). We 
can therefore construct diagnostics and outlier detection tools based on the above t distribu- 
tion of ut. Writing vt = (1 — 2n~[\)ut we have E(vf\y t ~ 1 ) = 1 and so the MSSE is defined as 
(T — l)" 1 X^t=2 v t i which if the model fit is good, should be close to 1. 

From the Student t distribution of yt\y l ~ l the log-likelihood function of (pi, $2,81, 82 based 
on data y T = {y 2 , ...,y T } is 

T 

<p2,8i,8 2 ;y T ) = ^p(yt|y t_1 ) 



t=2 



T(n t /2) 1 * ^ f , {y t -f t f 



^ Tint 2) 1^ 



1 + 



^im t -iT(n t -i/2) 2 ^ [ n t -iQtS t -i 

where T(.) denotes the gamma function. Model camparison can be carried out by using either 
one of the following criteria: likelihood function, Akaike's information criterion (AIC) and 
Bayesian information criterion (BIC) . In particular, we can choose optimal values of some or 
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all of the hyperparameters <pi, 4>2, S\, 82 by maximizing £(.). A discussion on the specification 
of the hyperparameters of the model can be found in Section 14.41 

For the application of the above diagnostic criteria, all data y T is needed to be available, 
or historical data can be used. However, sometimes it is useful to construct sequential diag- 
nostics so that the model can be assessed and updated over time in an adaptive way. Such 
diagnostics tools include sequential likelihood ratios and sequential Bay es factors. Here we 
jriefly discuss the latter, the foundations of which are discussed in detail in 



West and Harrison 



19971 . Chapter 11]. Suppose that, given a sample y T = {y\, . . . ,yr} we have two candidate 



models of the form of ([6]) that is they have the same structural form, but they may differ in 
the values of 0i, fa, 5\ and 62- Suppose that we denote the two models by Mi and M2 and 
for i = 1,2 we write (fru, (pi 2 , Sn and 5i2 to indicate the dependence of model Mi in these 
parameters. Then the Bayes factor of M\ versus M2 is given by the ratio of their respective 
one-step forecast densities, i.e. 

piytly^ 1 , Mi) 



r\ o 1 2 \ n t/2 / r\ o \ nt/2 

nt-iQ2tJ2,t-i + e 2t \ ( <^itoi,t-i x 



p{yt\y t ~ l ,M 2 ) \n t -iQitS 1)t -i + ef t 



.Q2tS2,t-l . 

where we have used that yt\y l ~ v ,Mi ~ t(n t -i, fu,QitSi,t-i), with the quantities f it , e it , Qu, 
Sij-i being appropriately indexed by i = 1,2. Given data y T one can either judge the 
performance of the two models sequentially (by comparing Ht to 1, for 2 < t < T) and 
thus arriving to a sequential monitoring of the two models, or use the entire data set y T to 
compare the models globally, e.g. one can extract the mean or other features of the empirical 
distribution of {Ht}. 



4.3 Convergence analysis 

Algorithm (|1Q|) is quite similar to the celebrated Kalman filter; conditional on a 2 , the algo- 
rithm exactly reduces to the Kalman filter, but the full algorithm allows for the estimation 
of a 2 that res ults in the Student t posterior distribution for Qf On the performance of the 
Kalman filter, 



Elliott et al. 



20051 ] state that the posterior covariance matrix of the param- 
eters converges to stable values and this has important implications on the stability of the 
state process {xt}. Indeed, it is well known that if the parameters of a state space model are 



constant, th en the posterio r covariance matri x of the stat e s con v erges to a stable va 



for instance, 



Harvey 



19891 . p. 119] as well as 



Chan et al 



19841 ] . 



Triantafyllopoulos 



uc: sec. 



2007bJ]. 
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However, the performance of the posterior covariance matrix Pt when the components of the 
model are made time-dependent has not been investigated; in our system this is conveyed via 
the time-varying vector Ft = (l,yt_i)'. This aspect is important as instability or divergence 
of Pt could result in instability of the estimation of At and P>t and hence of xt- The next 
result states that, in our system, Pt converges to stable values and we provide an explicit 
formula for the computation of the limit of Pt ■ 

Theorem 2. Suppose that {yt} is generated from model (0|). If {yt} is mean reverting and if 
for j = 1, 2, it is 5j < (fn, then as t — > oo the limit P of the covariance matrix Pt = Var(9t\y t ) 



exists and it is given by P = diag(pu , P22) , where 



with aif = 1 and a,2 t t = Vt-i- 



Some comments are in order. First we note that if 0i = <j>2 = 1 and Vt = (we have 
alr eady seen th a t this setting reduces the model to the time-invariant AR model considered 



m 



Elliott et al 



20051 ]) . then the condition 6j < $ is satisfied for all values of Sj, since 



0<8j < 1. 

From the mean reversion assumption of {yt}, if we write yt ~ where /1 denotes the 
equilibrium mean of the spread, then we can write the limit covariance matrix P as 



P 



^ 2 {<p\ -Si) 

M 02 2 (^i - &) 

In the important special case of <j)\ = S± = 1, for which At = A is time- invariant, we can 
easily see that 



P 



Pn,i 








\ 



E 



00 6 2 y ..2 



J 



where pn 5 i is the prior variance Var(^4). 

The convergence rate of the limit of Theorem[2]is geometric, since after some appropriately 
large ti, we can write yt ~ fJ,, for all t > ti and the limit of P depends on a geometric series. 

The above convergence results for Pt are given conditional on the variance a 2 . Given data 
up to time t, a 2 has a posterior inverted gamma distribution a 2 \y l ~ IG(rit/2,dt/2); hence, 
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as the time index gets larger, the variance of a 2 , which is given by 
Var(fjV) 



(m + t- l) 2 S 2 



(t > 5 - nx) 



(th - 2) 2 {n t - 4) 2 (nx + t - 3) 2 (?ii + t - 5) 

converges to 0. Therefore, as t — ► oo, cr 2 concentrates about its mode St = dt/ n t asymptoti- 
cally degenerating. 

4.4 Hyperparameter specification 



The estimation algorithm (|10p relies upon the specification of prior distributions and corre- 
sponding starting values (mi, Px, nx, dx) and values of the model components ((f>x, 4>2, $x, $2), 
which are selected by the user. In this brief section, considering weakly informative priors, 
we provide some guidance on how to choose these values. Of course, depending on the spe- 
cific application, other specifications may be preferred; for instance, the analyst may want t o 



include stronger prior beliefs regarding the spread being traded, see e.g. iKadane et al.l [1996]. 
Nevertheless, it is important to note that, given a reasonable amount of data, the sensitiv- 
ity of the calibration procedure on these initial specifications becomes negligible, especially 
over streaming data, because the initial information is de flated o yer time. This phenomeno n 



is discussed in some detail in 



Ameen and Harrison 



1984 and in 



Triantafvllopoulo! 



2007a ]. 



Detailed studies on pr i or sp e cification fo r the estimation of AR models can be found in 



Kadiyala and Karlsson 



19971 ] 



Ni and Sun 



20031 ] and in references therein. 



The parameter mi is the prior mean of the hidden state, given the observational variance, 
i.e. the mean of #i|cr 2 . A common choice is to set mi equal to our prior expectation of 
(Ax,Bi)', which may be obtained from the availability of historical data. In all examples of 
Section[5]we have used mi = (0, 0)'. This setting together with the vague prior Pi that follows, 
communicates a prior assumption of mean reversion, but with a large uncertainty placed a 
priori on (Ax, B\)' '. The convergence results reported in Theorem [2] above, guarantee that the 
choice of mi and Pi are not crucial for accurate estimation and forecasting. The covariance 
matrix Pi is chosen to be proportional to the 2x2 identity matrix, i.e. Pi = p\l2- Here a 
large value of px reflects a weakly informative or defuse prior specification, since in this case 
the precision P x _1 gets close to zero. Finally, values for nx and dx need to be provided. It 
can be noted that, having placed an inverted gamma prior on a 2 , the expected value of the 
observational variance is given by E(a 2 ) = dx/(n\ — 2), for nx > 2. Based on this observation, 
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a sensible choice is to set ii\ = 3 and use the prior expectation of a 2 as a starting value d\. 
Historical data may be used to specify di, but in the examples of Section [5] we have simply 
used d\ = 1. 

Proceeding now with the specification of <f>\ , <p2 , 5\ , 82 we can optimize these parameters 
by maximizing the log-likelihood function, given in Section 14.21 under the condition that 
Si < (pf so that Theorem [2] applies. Alternatively, according to Corollary [1] we can set 
(f>i = S\ = 1 and optimize only <f>2 and 82- In Section \5. 11 where we present simulation studies, 
we use the latter, while in Section 15.21 where we analyze real data, we use the former (full 
optimization of four parameters). We note that the likelihood function or Bayes factors can 
be used to compa re and optimize models u sing single discount factors 6\ = 82, known as 



single discounting 



West and Harrison! . 



19971 ] , and models using two different discount factors 



(component or multiple discounting). 



4.5 Pairs trading 



Under the assumption that the observed spread process involving two tradable assets is mean- 
reverting, and that the model of Eqs. ©-([3]) describes well its evolution at discrete observa- 
tional times t = fi , . . . , t/y, with N sufficiently small, a simple pairs trading strategy imme- 



diately follows 



Elliott et al 



2005]. Let us assume that xt denotes our best estimate of the 
hidden state, which is obtained by calibrating the model on data collected in the above data 
window. 

At each time t, if the observed spread yt is strictly greater than the true state xt, then 
a sensible decision would be to take a long position in this portfolio, with the intention 
of closing this position at a later time, when the spread has reverted back to its mean. 
Conversely, if yt < xt, the trader may decide to take a short position in the portfolio; this bet 
is expected to be a profitable one as soon as the spread process corrects itself again. Realistic 
implementations of this popular strategy may ask for additional layers of sophistication which 
in turn require the trader to face a few practical questions; some examples are: 

• How can transaction costs be included in this simple model? In other words, when 
is a trade expected to be profitable, so that an 'entry' signal can be generated? For 
instance, a long position could be initiated when yt — xt > Zt, where zt is a threshold 
that guarantees a profitable trade, after costs. The question then becomes, how should 

IS 



Zt be calibrated? For instance, IVidvamurthvl [20041 ] suggests a re-sampling procedure 
and provides some general guidance. There may exist several other alternative ways 
in which one could define entry points, perhaps based on empirical modeling of the 
extreme values of the yt — x± process. Theoret ical results on zero-crossing rates for 



autoregressive processes, as in 



Cheng et al. 



19951 ]. may also be explored. For aggr essive 



strategies that execute a trade at each single time tick, 



Montana et al 



20081 ] fore- 



casts the one-step ahead exp ected spread using dynamic regression methods, whereas 



Montana and Parrella 



20081 ] embrace the principle of "learning with experts" to deal 



with the uncertaincty involved in future movements on the spread. 

Analogously to the previous issue, how should an 'exit' signal be generated? And shall 
a trade be closed at an exit point, or simply reversed so that a long position becomes 
short, and viceversa? 

What stop-loss mechanism can be implemented to make sure that the assumptions on 
which the strategy relies are still satisfied? Surely, if the spread process is no longer 
believed to be mean-reverting, a stop-loss signal should be quickly generated. As will 
appear clearer later (see, for instance, the examples of Section [5]), our estimation proce- 
dure can be used to monitor mean-reversion sequentially and flag deviations from the 
acceptable behaviour of the spread process as soon a s they occur. Related co-integration 



arguments may also be used, as in 



Lin et al. 



2006 ]. 



How can suitable pairs of assets be chosen in the first place, especially when the universe 
of assets to search from is extremely large? Since arbitrage profits between two assets 
depend c ritically on the presenc e of a long-term equilibrium between them (see, for 



instance, 



Alexander et al. 



20021 ] ) , data mining metho ds built a round co-integration 



tech niques may be explored, as in 



and 



Pole 



d'Aspremont 



20081 ] . See also 



Vidvamurthv 



2004 ] 



20071 ] for alternative methods including simple correlation analysis, turning 



point analysis and latent factor models. 

As a final note, we mention a technique that may be deployed in a dynamic modeling 
setting, such as ours, to obtain the spread yt = Pt — PtPt^ in a recursive fashion. As noted 
before, the regression coefficient is usually estimated on historical data, but on-line proce- 
dures such as recursive least squares may also be used. The assumption of a time-invariant 
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regression coefficient (3 could also be released so as to allow (3 to change slightly over time; 
such a modification would capture a time-varying co-integration relationship between the two 
asset prices, where this extension deemed necessary. Assuming T historical observations, a 
regression model with a time-varying regression coefficient fit minimizes a cost function 

T T— l 

C(J3; K } " hp?} 2 + A* E(&+i " &) 2 ( U ) 

where [i > is a scalar determining how much penalization to place on temporal changes 
in the regression coefficient. When fi is very large, changes in the coefficient are penalized 
more heavily and, in the limit /x = oo, the usual OLS esti mate is recovered. A solution 
to the optimization problem above was originally proposed by 



Kalaba and Tesfatsion 



1988]. 



Following their approach, called flexible least squares (FLS), a recursive estimator for each (3t 
can easily be derived as 

-i 



$t= St_i + {p 2) } 2 



, (1) (2)1 

st-i +P t Pi J 



where we have defined the quantities 

St = {J> 
st = V 



^„ 1 + /i/ p + {pl 2) } 2 



t -i + te (2) } 2 } 

s^ + ^ + ipf^Yi^+pfV?} 



(12) 



(13) 



The r ecursions are initially started with some arbitrarily chosen values S± and s\. 



Montana et al. 



20091 ] show a clear algebraic connection between FLS and the Kalman filter and use this es- 



timation method to develop a dynamic statistical arbitrage strategy. 



5 Illustrations 

5.1 Simulated data 

In this section we initially report on a Monte Carlo simulation study demonstrating that the 
fast recursive algorithm (|10p described in Section 13.31 accurately estimates the parameters of 
the proposed model. We have simulated a large number of time series under model Q using 
a range of values for A, B and a 2 . The true parameters are kept constant in these initial simu- 
lations for simplicity, so they can be easily compared with the estimated posterior means. We 
have found that convergence to the true parameters A, B and a 2 is quickly achieved and the 
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estimated values of these parameters are not sensitive to the initial parameters fi\,Pi,ni, d\ 
(results not shown). 

We have also explored situations in which the parameters are time- varying. First, we have 
considered the case of a sudden change in the level of the spread; the time series fluctuates 
around an equilibrium level till t = 1500, and after that time it jumps to a much higher 
equilibrium. Clearly for 1 < t < 1499 the process is mean reverting, then at t = 1500 it looses 
mean reversion, but it retains it in the sub-period 1500 < t < 3000; of course the process 
is not mean reverted for the entire period 1 < t < 3000. Figure Q] shows how the posterior 
mean of \Bt\ is tracked using two different values of the discount factor 82 ■ Our focus is on 
monitoring Bt because, as established in Theorem [TJ this parameter is the ultimate object of 
interest. As shown in Figure [JJ the algorithm with 82 = 1 (which corresponds to a model with 
time-invariant parameters) does not manage to capture the loss of mean-reversion observed at 
time t = 1500; in fact the algorithm gives the misleading result of mean reversion throughout 
the time range. On the contrary, when using a smaller discount factor (which corresponds to a 
model with time- varying parameters), the algorithm tracks the jump almost in real-time and 
communicates the result that after t = 1500 the process has locally regained mean reversion. 

Furthermore, we have considered a more hypothetical scenario that may be of practical 
interest: Figure [2] corresponds to a scenario where Bt is piece-wise constant and undergoes a 
large sudden jump at time t = 1500. Again, the algorithm is able to track well mean reversion 
locally, although the true parameter Bt may not be estimated very accurately. 

FIGURES 1-2 AROUND HERE 



5.2 Equity data 

In this section we apply our methods to spreads obtained from historical equity data. Each 
spread is computed using the flexible least squares (FLS) method with a very large [i parame- 
ter; this is almost equivalent to ordinary least squares (OLS) regression but allows for recursive 
estimation. As a simple validation exercise, we also compare the findings obtained from our 
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model to formal cointegration tests which assume the availability of all data points. The very 
firs t procedure for the estim ation of cointegrating regressions, based on OLS, was proposed 



by 



Engle and Granger 



19871 ] . Since then several ot 



el uding the maximum likeliho o d method of 



of 



Phillips and Hanse 



n 



1990( 1 . 



Har greaves 



rer procedu res have been developed in- 



Johansen 



1988 



19911 ] and the fully modified OLS 



19941 ] lists eleven categories of procedures, and 



several more have been added in more recent ye ars. For our analysi s we h ave considere only 



three po pular tests: E ngle-Granger's ADF test 
PP test 



Perron 



Engle and Grangerl. 



198711 . Phillip s-Perron's 



Phillips and Ouliarisl . 
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19881 ] and Phillips-Ouliaris's PO test 
The first data sets we present consists of daily share prices of two companies: Exxon 
Mobil (XOM) and Southwest Airlines (LUV). We have used all the available data for this 
pair of stocks, which spans a period from March 23, 1980 to August 6, 2008. Figure reports 
the estimated posterior mean of Bt and its confidence band for the period March 23, 1980 
to November 30, 2004. Clearly, from March 23, 1980 till November 8, 2004 the posterior 
mean of \Bt\ stays below one, which according to Theorem [1] indicates mean-reversion of the 
spread time series. Figure [4] shows the observed spread time series as well as the estimated 
hidden state process and its posterior confidence band for this subperiod of the data. For the 
estimation of (A u B t )' we have used <\>\ = 0.1, fa = 99839, Si = 0.992 and S 2 = 0.995 that 
maximize the log- likelihood function All j> . 

When using all historical data (1980-2008), all three standard cointegration tests cannot 
reject the null hypothesis of unit roots (p-values: 0.246, 0.219 and 0.15). This is in agreement 
with the patterns captured by Figure El which reveals that after November 8, 2004, mean 
reversion is lost. However, when the analysis is restricted to the period November 8, 2004, 
both the PP and PO tests reject the null hypothesis of unit roots at a 5% significance level 
(p- values: 0.013 and 0.024, respectively). The ADF test, however, disagrees and does not 
reject the null hypothesis of unit roots {p- value 0.139). Thus, in this example, only two out 
of three tests agree with the evidence provided by our on-line monitoring device. 

Our second example illustrates a co-integration relationship existing between two ETFs 
operating in the commodity market. ETFs are relatively new financial instruments that have 
exploded in popularity over the last few years. They are securities that combine elements of 
both index funds and stocks: like index funds, they are pools of securities that track specific 
market indexes at a very low cost; like stocks, they are traded on major stock exchanges and 



22 



can be bought and sold anytime during normal trading hours. We have collected historical 
time series for the SPDR Gold Shares (GLD) and Market Vectors Gold Miners (GDX) ETFs. 
GLD is an ETF that tries to reflect the performance of the price of gold bullion, whereas 
GDX tries to replicate as closely as possible, before fees and expenses, the price and yield 
performance of the AMEX Gold Miners index. This is achieved by investing in all of the 
securities which comprise the index (in proportions given by their weighting in the index). 
This analysis is based upon all the historical data available for the pair, which covers a shorted 
period compared to the previous example, from May 23, 2006 until August 06, 2008. Figure 
[5] shows the observed spread process jointly with the estimated hidden process and confidence 
bands, while Figure [6] indicates that a co-integrating relationship between the two ETFs does 
exist in the period from July 19, 2006 till 17 December, 2007. For this data set we have used 
<pi = 0.999, 4>2 = 99, Si = 0.95 and 62 = 0.98 that maximize the log-likelihood function (fTTj) . 

When all the historical data is used, the ADF and the PP tests indicate the presence of co- 
integration at a 5% significance level (p-values: 0.01 and 0.01, respectively) and only the PO 
test suggest lack of co-integration, a result that also agrees with the pattern reported in Figure 
[6l Considering the period July 19, 2006 till 17 December, 2007, for which our results suggest 
mean reversion, we find that all three tests also suggest co-integration (p-values: 0.0201, 0.013 
and 0.012). Further formal comparisons and more detailed studies will be needed in order 
to characterize some of the discrepancies; however, based on this empirical evidence, our 
suggested time- varying model seems to generally agree with most formal cointegration tests. 



FIGURES 3-6 AROUND HERE 



6 Conclusions 

In this paper we have proposed a Bayesian time-varying autoregressive model, expressed in 
time-space form, and an efficient recursive algorithm based on forgetting or discount factors. 
The procedure can be used for real-time estimation and tracking of the underlying spread 
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process and may be seen as a more efficient alternative to standard iterative MLE procedures 
such as the EM algorithm. Conditions for mean-reversion as well as the convergence properties 
of the on-line estimation algorithm have been studied analytically and discussed. The model 
seems particularly useful for monitoring mean-reversion using financial data streams and as 
a building block for statistical arbitrage strategies such as pairs trading. Related algorithmic 
trading strategies that exploit co- integration of financial instru ments, for instance ind ex ar- 



bitrage 



Sutcliffe and Board! . 



2006] and enhanced index tracking 



Alexander et al 



20021 ] . may 



also benefit from the methods proposed here. Moreover, although the focus of this work 
has been on applications in computational finance, we believe that the methods described 
here are of broader interest and may appeal to other users, within the management science 
community, who need to model and monitor mean-reverting time series arising in different 
application domains 

There are several aspects of the suggested methodology that we would like to explore 
further in future work. First, purely from an empirical point of view, we would like to better 
understand how the methodology relates to more formal statistical procedures for testing 
the hypothesis of mean reversion based on finite sample sizes. As already mentioned, since 
mean-reversion is closely linked to second order stationarity, many efforts have been directed 
to constructing unit root tests. These standard econometric procedures may lack the power 
to reject the null hypothesis of a random walk, and we feel that our method may at least 
complement them well. Beside s, some of the rec ently suggested procedures, such as the 
bootstrap methods described by 



Li and Xiao 



20031 ] . are too computationally expensive to be 



of any use in the real settings and applications that we have described. Another important 
aspect that we plan to investigate is the question of how to learn the discounting factors needed 
to specify the Vt matrix in a more adaptive fashion, so that they become self-tuning, rather 
than being kept constant at all times. A number of techniques have been successfully used for 
training adaptive artificial neural networks and ot her time- varying stochastic processes using 
forgetting factors 



Saadl . 



1999 



Niedzwiecki 



20001 ] and there may be scope for improvement 



along this direction. 
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Appendix 

Proof of TheoremUl With fa = fa = 1 and Vt = 0, the state space model ([6]) reduces to the 
AR model yt = A + Byt~\ + e^, where At = A and Bt = B and it is trivial to verify that {yt} 
is mean reverting if \B\\ < 1, see also Section [2j This completes (a). 

Proceeding now to (b), from the AR model for At we note that E(At) = 0. From ([6]) 
write yt recursively as 

y t = A t + Btyt-i + e t = A t + B t A t ^ + B t B t - iyt - 2 + B t e t ^i + e t = ■ ■ ■ 

t t-3 j t-3 j 

= yi n B i + s n ^-^t-j-i + a * + s n ^^3-1 + ^ 

i=2 j=0 i=0 j=0 i=0 

We write ^4* = (A\, . . . , A t ) and B l = . . . , S t ), for t = 1, . . . , T. Since {et} is white noise, 
we have 

t 

JSdte|J3*) =yi JJJ3i (A-l) 

This is a convergent series if |J3t| < 1, for all t > to, for some positive integer to- To see 
this first write xf' = Yll^^i, which is a decreasing series as \x^ x fx^\ = \B t+ i\ < 1. Also 
{xj 1 ^} is bounded as \x[^\ = Y[l=2 l-^l < 1 anc ^ so { x t^} i s convergent. 

For the variance of yt we have 

t-3 j t-3 j 

VarfolS*) = Var(^)+^n^ Var ^-0 + En B t- Var ( e *-i- 1 ) 

j=0 i=0 j'=0 i=0 

t-3 j 

+Var(e t ) + ^ [] 5 t ^Cov(A t , A t ^ x ) 

j=0 i=0 



< a 2 



vi \ vi / ._ n ,•— n vi -_ n -_ n 



j=0 i=0 rl j=0 i=0 

where it is used that 

Var(^)<f% and Cw(A t , A t ^) < f% 

l-0f 1-01 
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for Vn s t < Vn, since from the hypothesis Vt is bounded, and so there exists some V\\ > so 
that Vu,t < Vu. 

Now we show that the series x\ = Sj=o T\i=o ^t-i anc ^ x t = Sj=o 111=0 &t-i are 
both convergent. For the former series we note that given \Bt\ < 1, we can find some B so 
that \Bt\ < \B\ < 1, from which it follows that 

t-3 j t-3 3 t-3 

i-! 2) i<Em^^En^i=Ei^ +i 

j=0 i=0 j=0 i=0 j=0 

(2) 

which is proportional to a geometric series that converges for \B\ < 1 and since x\ is a 

(2) 

positive series, it follows that {x\ } is convergent. 

(3) 

For the series x\ , we follow an analogous argument, i.e. for B satisfying |J3 t | < \B\ < 1 
we obtain 

K (3) l<El^P' +1 

3=0 

which shows that x^ is convergent as X^=o I01^P +1 i s a geometric series with \<j)\B\ < 1 
and is a positive series. 

With these convergence results in place, the convergence of Vsx(yt\B t ) is obvious. Given, 
I?*, we have shown that the mean and the variance of {yt} are convergent and so {yt} is mean 
reverting. □ 

Proof of TheoremlM From the diagonal structure of Pf = diag(pn.(,p22,t) and the updating 
of Pf as in the calibration algorithm (|10p we have 

4>iPu,t-i aittPiPiij-i^ 2 _ <j> 2 Pii,t-i f, a it 4> 2 pii^i 



Si ait$,Pa,t-\h 1 + 1 S i \ $i + ait4>1pu,t-i 

4$Pii,t-l 



^ + au(f)iPu,t-i 
We can clearly see that pa t > 0, for all t and so we have 

Pu,t 4>tPu,t-i <t>f 2 pu,i j^\4>V 

Now since 5{(j)~ 2 ^ ctij-j is a positive sequence and since from the mean reversion of {yt} and 
the definition of an, the above sequence is bounded above by the geometric sequence 
where M is an upper bound of {yt}, it follows immediately that Ylj=o < P~ 2 ' > a i,t-j < °o and 
this proves that pa tt converges to X^=o 4>~ 2 ^ a i,t-j ■ The proof is completed by inverting 
(O, noting that p ii)t > and £^L & !° ~ J "i.f J > °- D 
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Posterior Estimation of |B t | 
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Figure 1: Estimation of \Bt\ for the simulated spread with a jump at t = 1500. We have 
chosen a prior Pi = 1000/2- A value of 5\ = 62 = S = 1, which corresponds to the adoption 
of a time-invariant model, fails to capture mean-reversion following immediately after the 
change of equilibrium ar time t = 1501. However, forgetting factors Si = 1 and 62 = S = 0.98 
tracks the abrupt change in mean level and and following quick restoration of mean-reversion. 
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Posterior Estimation of |B t | 
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Figure 2: Estimation of abruptly varying Bt; shown is the posterior mean of \Bt\. The real 
parameters are A = 0.2, B t = 0.25 and a 2 = 1, for 1 < t < 1500; A = 0.2, B t = 1 and a 2 = 1, 
for 1501 < t < 3000. We have chosen a prior Pi = IOOO/2 and 5\ = 1. Two selected values of 
62 = 5 are used, 5 = 1 and 5 = 0.98. 
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XOM-LUV Spread: Posterior Estimation of |B t | 
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XOM-LUV Spread 
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Figure 4: Observed spread and state spread using a recursive regression routine for on-line 
spread availability. 
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Figure 5: Observed spread and state spread using a recursive regression routine for on-line 
spread availability. 
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Figure 6: Posterior estimation of \B t \. We have used 4>i = 0.999, 4>2 = 99, Si = 0.95, 62 = 0.98 
and a prior Pi = 1000/2- 
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